Data Integration and Pattern-Finding in Biological Sequence with TESS's Annotation Grammar and Extraction Language (AnGEL)
Abstract
Decoding the functional elements in an organism’s genome requires integrating a wide variety of experimental and computational data from many sources. The location of these data, viewed as sequence features in the genome, must serve as one of the essential organizing principles for this integration. It is therefore important to have a data integration system that takes advantage of this fact. As part of the TESS project, we have developed a grammar-based data integration and pattern search tool, the Annotation Grammar and Extraction Language (AnGEL), that follows this principle. AnGEL can represent most of the current work in cis-regulatory module (CRM) modelling in an intuitive way and can process data extracted from a variety of sources simultaneously. Here we describe AnGEL’s capabilities and illustrate its use by querying for gene arrangements, CRMs, and protein domain structure.
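The idea of organizing heterogeneous data by genomic location and then searching for ordered arrangements of features can be illustrated with a small sketch. This is not AnGEL syntax or the TESS implementation; the `Feature` record, the `integrate` and `find_pattern` functions, the feature kinds, the source names, and all coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Feature:
    """A located annotation (hypothetical record, not an AnGEL type)."""
    start: int
    end: int
    kind: str
    source: str


def integrate(*sources: Sequence[Feature]) -> List[Feature]:
    """Merge features from several sources into one position-ordered list."""
    merged = [f for src in sources for f in src]
    return sorted(merged, key=lambda f: (f.start, f.end))


def find_pattern(features: Sequence[Feature],
                 kinds: Sequence[str],
                 max_gap: int) -> List[List[Feature]]:
    """Find chains of features whose kinds appear in the given order.

    Consecutive features in a chain must not overlap and must be separated
    by at most `max_gap` positions. `features` must be position-ordered.
    """
    matches: List[List[Feature]] = []

    def extend(chain: List[Feature], next_idx: int) -> None:
        if len(chain) == len(kinds):
            matches.append(chain)
            return
        for i in range(next_idx, len(features)):
            f = features[i]
            if f.kind != kinds[len(chain)]:
                continue
            if chain:
                gap = f.start - chain[-1].end
                if gap < 0 or gap > max_gap:
                    continue
            extend(chain + [f], i + 1)

    extend([], 0)
    return matches


# Made-up features from three made-up sources.
motif_scan = [Feature(100, 110, "TFBS_A", "motif_scan"),
              Feature(400, 410, "TFBS_A", "motif_scan")]
binding_assay = [Feature(130, 142, "TFBS_B", "binding_assay")]
gene_models = [Feature(180, 900, "gene", "gene_models")]

track = integrate(motif_scan, binding_assay, gene_models)

# A CRM-like arrangement: site A, then site B, then a gene, each within 50 positions.
hits = find_pattern(track, ["TFBS_A", "TFBS_B", "gene"], max_gap=50)
for chain in hits:
    print([(f.kind, f.start, f.end, f.source) for f in chain])
```

In this sketch the pattern is a fixed sequence of feature kinds with a single gap limit. A grammar-based tool of the kind the abstract describes would presumably allow richer rules, but the sketch shows why position-ordered integration makes such queries straightforward.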
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...
An Exploration of Teachers' Beliefs about the Role of Grammar in Iranian High Schools and Private Language Institutes
This study was an attempt to explore the beliefs of Iranian EFL teachers about the role of grammar in English language teaching in both state schools and private language institutes. Data were collected through a questionnaire developed by Burgess and Etherington (2002), which consisted of 11 main subscales and was divided into two sections. The first section dealt with approaches to grammar te...
Annotation and Issues in Building an English Dependency Treebank
The Paninian Grammar framework, given by Panini for his analysis of the Sanskrit language, is finding extensive application to languages other than Sanskrit, about two thousand five hundred years after its formulation. The work presented in this paper is one such application that extends Paninian Grammar (PG or CPG: Computational Paninian Grammar) to English, a fixed word order language. It pre...
Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM
Finding repetitive subsequences in a genome is a challenging problem in bioinformatics research. Many approaches have been proposed to solve it, and they can be divided into library-based and de novo methods. Library-based methods use predetermined repetitive genome subsequences, whereas library-less methods attempt to discover repetitive subsequences by analytical approach...
A framework for traversing dense annotation lattices
Pattern matching, or querying, over annotations is a general purpose paradigm for inspecting, navigating, mining, and transforming annotation repositories—the common representation basis for modern pipelined text processing architectures. The open-ended nature of these architectures and expressiveness of feature structure-based annotation schemes account for the natural tendency of such annotat...